Detecting and Typing of Bacterial Strains

ABSTRACT

Methods for the detection and typing of bacterial strains from food products and dietary supplements, environmental samples, in vivo/in vitro samples, and for studying the natural diversity of the species are disclosed. Potential applications also include product development and/or detection and differentiation of new bacterial strains.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application Ser. No. 60/566,007, filed Apr. 28, 2004, the contents of which are hereby incorporated in their entirety by reference herein.

FIELD OF THE INVENTION

This invention relates to methods for detecting and typing bacterial strains, specifically Lactobacillus strains.

BACKGROUND OF THE INVENTION

Rapid and accurate differentiation of bacterial strains is important when making medical diagnoses, in epidemiological studies, and for studying evolutionary diversity among bacteria. Various methods exist for typing or detecting bacterial strains, including RFLP, hybridization, and sequencing. Epidemiologically informative microsatellite DNA polymorphisms have been observed in different strains of Helicobacter pylori (Marshall et al. (1996) J. Appl. Bacteriol. 81:509-517). Similarly, repetitive DNA elements of Mycobacterium tuberculosis have been used for efficient strain tracking (Van Soolingen et al. (1993) J. Clin. Microbiol. 31:1987-1995). In addition, short sequence repeat (SSR) variation has been used to differentiate the strains of Haemophilus influenzae isolated from different patients (van Belkum et al. (1997) Infect. Immun. 65:5017-5027). However, current methods available to specifically differentiate bacterial strains, such as Lactobacillus acidophilus strains, are based either on 16S rRNA gene sequencing, which is accurate only to the species level, or on long and difficult Pulsed Field Gel Electrophoresis (PFGE) procedures.

The CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat; also called SPIDR (Spacers Interspersed Direct Repeats), VNTR (Variable Number of Tandem Repeats), SRVR (Short Regularly Variable Repeats), and SRSR (Short Regularly Spaced Repeats)) loci, described by Jansen et al. (2002) OMICS J. Integr. Biol. 6:23-33, constitute a novel family of repeat sequences that is present in Bacteria and Archaea but not in Eukarya. The repeat loci typically consist of repetitive stretches of nucleotides with a length of 25 to 37 base pairs alternating with nonrepetitive DNA spacers of approximately equal length. To date, CRISPR loci have been identified in more than forty microorganisms (Jansen et al. (2002) OMICS J. Integr. Biol. 6:23-33), but among the lactic acid bacteria they have only been described in Streptococcus species. Despite their discovery over 15 years ago in E. coli (Ishino et al. (1987) J. Bacteriol. 169:5429-5433), no physiological function has yet been discovered. The nucleotide sequences of the repeats are generally highly conserved within a species, but show low similarity between species. It has also been shown that variability among CRISPR loci is due not primarily to single nucleotide base changes, but rather to deletions/insertions of entire repeat and spacer regions. These properties have led to the use of the CRISPR loci as a strain-typing tool in Mycobacterium (Groenen et al. (1993) Mol. Microbiol. 10:1057-1065).

As methods to differentiate Lactobacillus bacteria, specifically L. acidophilus, are either not accurate to the strain level or are technically demanding, the development of new methods for differentiating Lactobacillus strains is desirable.

BRIEF SUMMARY OF THE INVENTION

Compositions and methods for detecting and typing bacteria are provided, particularly for Lactobacillus strains, for example Lactobacillus acidophilus strains. Compositions of the invention include isolated nucleic acid molecules from Lactobacillus acidophilus comprising a region of DNA, preferably located between the genes for DNA polymerase I (polA) and a putative phosphoribosylamine-glycine ligase (purD), consisting of one or more copies of a repetitive DNA sequence of about 20-40 base pairs, such as about 25-35 base pairs or about 27-30 base pairs, interspersed with nonrepetitive spacer sequences of about the same length. In one embodiment, the isolated nucleic acid molecule comprises a 29 base pair sequence that is present 32 times, and is separated by the same number of 32-base pair spacer sequences.

Compositions of the invention also include isolated nucleic acid molecules from Lactobacillus brevis, Lactobacillus casei, and Lactobacillus delbrueckii ssp. bulgaricus comprising repetitive sequences originally identified in a CRISPR region. In one embodiment, the isolated nucleic acid molecule comprises a 28 base pair sequence from L. brevis. In another embodiment, the isolated nucleic acid molecule comprises a 28 base pair sequence from L. casei. In yet another embodiment, the isolated nucleic acid molecule comprises a 28 base pair sequence from L. delbrueckii ssp. bulgaricus.

Variant nucleic acid molecules sufficiently identical to the nucleotide sequences are also encompassed by the present invention. Additionally, fragments and sufficiently identical fragments of the nucleotide sequences are encompassed. Specifically, the present invention provides for isolated nucleic acid molecules comprising one or more nucleotide sequences found in SEQ ID NOS:1-50. The present invention further provides for isolated nucleic acid molecules comprising 1-140 repeats of a nucleotide sequence of the invention, or a variant thereof. In some embodiments, the isolated nucleic acid molecules comprise more than 5 repeats, more than 10 repeats, fewer than 50 repeats, or fewer than 35 repeats of a nucleotide sequence of the invention, or a variant thereof. Compositions also include PCR primers for amplifying this region in a Lactobacillus species, including L. acidophilus, L. brevis, L. casei, and L. delbrueckii. Nucleotide sequences that are complementary to a nucleotide sequence of the invention, or that hybridize to a sequence of the invention, are also encompassed. Further included are methods and kits for detecting the presence of a nucleic acid sequence of the invention in a sample, and methods and kits for typing bacteria, including Lactobacillus strains, particularly L. acidophilus, L. brevis, L. casei, and L. delbrueckii strains.

Methods for typing a bacterium having a CRISPR region are provided. The methods comprise obtaining a sample comprising the bacterium; amplifying a region of DNA comprising the CRISPR region or a fragment thereof in the sample to create amplified DNA; adding to the amplified DNA at least one restriction enzyme that recognizes one or more sites in the amplified DNA; incubating the restriction enzyme with the amplified DNA for a time sufficient to form restriction fragments; determining the number of the restriction fragments and their size; and typing the bacterium based on the number and size of the restriction fragments.
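The final typing step, comparing the number and size of restriction fragments, can be illustrated with a minimal Python sketch. It is a hypothetical illustration rather than the procedure used in the examples: it assumes fragment sizes have already been estimated from a gel in base pairs, treats two bands as matching when their sizes differ by no more than an assumed tolerance (5% here), and scores profiles with the Dice band-matching coefficient, a common similarity measure for gel fingerprints. The function names and reference profiles are invented for illustration.

```python
"""Hypothetical sketch: type a strain by comparing restriction-fragment profiles."""


def count_shared_bands(sizes_a, sizes_b, tolerance=0.05):
    """Greedily pair bands whose sizes differ by at most `tolerance`
    (a fraction of the larger band); each band is used at most once."""
    unused = sorted(sizes_b)
    shared = 0
    for size in sorted(sizes_a):
        for i, other in enumerate(unused):
            if abs(size - other) <= tolerance * max(size, other):
                shared += 1
                del unused[i]  # safe: the loop is exited right after
                break
    return shared


def dice_similarity(sizes_a, sizes_b, tolerance=0.05):
    """Dice coefficient: 2 * shared bands / (bands in A + bands in B)."""
    total = len(sizes_a) + len(sizes_b)
    if total == 0:
        return 1.0
    return 2 * count_shared_bands(sizes_a, sizes_b, tolerance) / total


def type_strain(observed, references, tolerance=0.05):
    """Return (name, score) of the reference profile most similar to `observed`."""
    scores = {name: dice_similarity(observed, bands, tolerance)
              for name, bands in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical reference profiles (fragment sizes in bp), not data from this work.
references = {"strain_X": [310, 220, 150, 90], "strain_Y": [410, 220, 90]}
print(type_strain([305, 222, 151, 88], references))  # ('strain_X', 1.0)
```

Greedy pairing of sorted sizes is a simplification; practical gel-analysis workflows also normalize each lane against a size ladder before bands are matched.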

A method for typing a Lactobacillus bacterial strain is also provided. The method comprises obtaining a sample, amplifying a region of DNA comprising at least one of the nucleotide sequences set forth in SEQ ID NOS:1-7 and 37-48, or a variant thereof, in the sample to create amplified DNA, and typing the bacterial strain based on the amplified DNA. The method may further comprise adding to the amplified DNA at least one restriction enzyme that recognizes one or more sites in the amplified DNA, incubating the restriction enzyme with the amplified DNA for a time sufficient to form restriction fragments, determining the number of the restriction fragments and their size, and typing the bacterial strain based on the number and size of the restriction fragments. Alternatively, the method may further comprise sequencing the amplified DNA to obtain sequencing results, and typing the bacterial strain based on the sequencing results. In one embodiment, the Lactobacillus is L. acidophilus.

The amplified DNA may be obtained by providing a first primer that binds to a repetitive sequence in a CRISPR region, providing a second primer that binds to DNA flanking (i.e., upstream or downstream of) the CRISPR region, using the primers in a PCR reaction to create amplified DNA, separating the amplified DNA on a gel to produce a distinct band pattern showing the number and sizes of the amplified DNA, and typing the bacterial strain based on the band pattern. The number and sizes of the bands are characteristic of the strain. The amplified DNA may alternatively be obtained by providing a first primer that binds to a region of DNA upstream of the CRISPR region and a second primer that binds to a region of DNA downstream of the CRISPR region, using the primers in a PCR reaction to create amplified DNA, separating the amplified DNA on a gel to produce a band showing the size of the amplified CRISPR DNA, and typing the bacterial strain based on the band size. The size of the amplified DNA is characteristic of the strain.

Methods for detecting the presence of a Lactobacillus species in a sample are provided. The methods comprise obtaining a sample, amplifying a region of DNA comprising at least one of the nucleotide sequences set forth in SEQ ID NOS:1-7 and 37-48, or a variant thereof, to create amplified DNA, and detecting the amplified DNA. The methods may further comprise adding to the amplified DNA at least one restriction enzyme that recognizes one or more sites in the amplified DNA, incubating the restriction enzyme with the amplified DNA for a time sufficient to form restriction fragments, determining the number of the restriction fragments and their size, and detecting the presence of a Lactobacillus species based on the number and size of the restriction fragments. Alternatively, the methods may further comprise sequencing the amplified DNA to obtain sequencing results, and detecting the presence of a Lactobacillus species based on the sequencing results.

The methods of the present invention are useful for the detection and typing of bacterial strains, including Lactobacillus strains such as L. acidophilus, L. brevis, L. casei, and L. delbrueckii strains, in food products and dietary supplements, including animal feed and animal feed supplements, in in vivo/in vitro samples, and for studying the natural diversity of the species from environmental samples. The methods are also useful for product development and identification of new bacterial strains, particularly Lactobacillus strains.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an intergenic region in Lactobacillus acidophilus having features of a CRISPR locus.

FIG. 2 shows the nucleotide sequences of the repeat regions in the intergenic region (SEQ ID NO:1). The 29-bp repeats are highlighted (SEQ ID NOS:2-7). An imperfect inverted repeat is indicated by an underline on the last repeat. The spacer regions (SEQ ID NOS:8-35) and a flanking region (SEQ ID NO:36) are not highlighted. Two sequences are repeated in the spacer region; one is repeated twice (outlined and bold) (SEQ ID NO:15) and one is repeated three times (caps and bold) (SEQ ID NO:13).

FIG. 3 consists of micrographs of various agarose gel electrophoresis experiments. FIG. 3A shows PCR products and FIGS. 3B, C, and D show restriction fragment results. A. Lane M: 1 kb DNA ladder; Lane 1: NCFM®; Lane 2: Strain C; Lane 3: Strain D; Lane 4: ATCC 4356; Lane 5: Strain B; Lane 6: Strain E. B. Lane M: 50 bp DNA ladder; Lane 1: NCFM®; Lane 2: Strain C; Lane 3: Strain D; Lane 4: ATCC 4356; Lane 5: ATCC 4357; Lane 6: Strain B. C. Lane M: 50 bp DNA ladder; Lane 1: NCFM®; Lane 2: Strain C; Lane 3: Strain D; Lane 4: ATCC 4356; Lane 5: ATCC 4357; Lane 6: Strain B. D. Lane M: 50 bp DNA ladder; Lane 1: NCFM®; Lane 2: Strain C; Lane 3: Strain D; Lane 4: ATCC 4356; Lane 5: ATCC 4357; Lane 6: Strain B.

FIG. 4 is a micrograph of PCR products of the following strains: Lane 1: L. acidophilus NCFM®; Lane 2: L. acidophilus Lac-1; Lane 3: L. acidophilus Lac-2; Lane 4: L. acidophilus Lac-3; Lane 5: L. acidophilus ATCC 4355; Lane 6: L. acidophilus ATCC 4356; Lane 7: L. acidophilus ATCC 4357; Lane 8: L. acidophilus ATCC 4796; Lane 9: L. helveticus ATCC 521; Lane 10: L. acidophilus ATCC 832; Lane 11: L. acidophilus ATCC 9224; Lane 12: L. acidophilus ATCC 11975; Lane 13: L. acidophilus ATCC 314; Lane 14: L. gasseri ATCC 43121; Lane 15: L. acidophilus Lac-4; Lane 16: L. acidophilus Lac-5; Lane 17: L. amylovorus ATCC 33198; Lane 18: L. gallinarum ATCC 33199; Lane 19: L. gasseri ATCC 33323; Lane 20: L. johnsonii ATCC 33200; Lane 21: L. crispatus Lcr-1; Lane 22: L. helveticus Lhe-1; Lane 23: control (no DNA).

FIG. 5 is a micrograph of bands resulting from PCR amplification followed by restriction digest of the following strains: Lane 1: L. acidophilus NCFM®; Lane 2: L. acidophilus Lac-1; Lane 3: L. acidophilus Lac-2; Lane 4: L. acidophilus Lac-3; Lane 5: L. acidophilus ATCC 4355; Lane 6: L. acidophilus ATCC 4356; Lane 7: L. acidophilus ATCC 4357; Lane 8: L. acidophilus ATCC 4796; Lane 9: L. acidophilus ATCC 832; Lane 10: L. acidophilus ATCC 9224; Lane 11: L. acidophilus ATCC 11975; Lane 12: L. acidophilus ATCC 314; Lane 13: L. acidophilus Lac-4; Lane 14: L. acidophilus Lac-5.

FIG. 6 is a micrograph showing pulsed field gel electrophoresis of L. acidophilus NCFM® (Lane 1); L. acidophilus Lac-1 (Lane 2); L. acidophilus Lac-3 (Lane 3); and L. acidophilus ATCC 4356 (Lane 4).

FIG. 7 shows a repeat sequence from L. acidophilus (Lac) (SEQ ID NO:37); L. brevis (Lbr) (SEQ ID NO:38); L. casei (Lca) (SEQ ID NO:45); and L. delbrueckii ssp. bulgaricus (Lde) (SEQ ID NO:46). Variant nucleotides and their positions are shown below the main sequence.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to methods and compositions for detecting and/or typing bacterial strains, such as Lactobacillus strains, including Lactobacillus acidophilus, L. brevis, L. casei, and L. delbrueckii. These methods can be used in medical and food safety diagnostics, or in research. By “typing” or “differentiation” is intended the identification of the strain of a bacterium, including identifying that it is distinct from other strains based on its nucleotide sequence (e.g., by analyzing the band pattern resulting from restriction enzyme digestion). By “detection” is intended the verification of the presence or absence of a species of bacteria in a sample. Compositions of the invention include isolated nucleic acid molecules from L. acidophilus, L. brevis, L. casei, and L. delbrueckii that are part of a CRISPR locus. By “CRISPR region” or “CRISPR locus” is intended a repetitive stretch of nucleotide sequence, wherein the repeats are about 20 to about 40 base pairs in length and alternate with nonrepetitive DNA spacers of approximately equal size. The acronyms CRISPR, SPIDR, VNTR, and SRVR have each been used to describe a nucleotide sequence having interspaced repeats.

Additionally, the present invention provides methods and kits for bacterial typing that may be used to determine similarities and/or differences between bacterial strains, particularly Lactobacillus strains, including L. acidophilus, L. brevis, L. casei, and L. delbrueckii, and methods and kits for detecting the presence or absence of a Lactobacillus species in a sample. More particularly, the invention provides a rapid, semi-automated method for the detection and/or typing of strains of prokaryotic organisms, such as L. acidophilus, in which a CRISPR DNA sequence is amplified and used to differentiate between strains of Lactobacillus or to detect a Lactobacillus species.

Isolated nucleic acid molecules of the present invention comprise the nucleotide sequences set forth in SEQ ID NOS:1-50, and variants and fragments thereof. The present invention also encompasses molecules that are complementary to these nucleic acid sequences, or that hybridize to these sequences.

The nucleic acid compositions encompassed by the present invention are isolated or substantially purified. By “isolated” or “substantially purified” is intended that the nucleic acid molecules, or biologically active fragments or variants, are substantially or essentially free from components normally found in association with the nucleic acid in its natural state. Such components include other cellular material, culture media from recombinant production, and various chemicals used in chemically synthesizing the nucleic acids. Preferably, an “isolated” nucleic acid of the present invention is free of nucleic acid sequences that flank the nucleic acid of interest in the genomic DNA of the organism from which the nucleic acid was derived (such as coding sequences present at the 5′ or 3′ ends). However, the molecule may include some additional bases or moieties that do not deleteriously affect the basic characteristics of the composition. For example, in various embodiments, the isolated nucleic acid contains less than 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb, or 0.1 kb of nucleic acid sequence normally associated with the genomic DNA in the cells from which it was derived.

The compositions and methods of the present invention can be used to detect Lactobacillus species, including L. acidophilus, L. brevis, L. casei, and L. delbrueckii, or to type bacterial strains, including very closely related L. acidophilus strains, both in the laboratory and in commercial products. This is useful for product development, as well as for research in bacterial species diversity and evolution, and in the identification of new bacterial strains, including new Lactobacillus strains.

Detection and Differentiation of Bacterial Strains

CRISPR loci are a distinct class of interspersed short sequence repeats (SSRs) that were first recognized in E. coli (Ishino et al. (1987) J. Bacteriol. 169:5429-5433; Nakata et al. (1989) J. Bacteriol. 171:3553-3556). Similar interspersed SSRs have been identified in Haloferax mediterranei, Streptococcus pyogenes, Anabaena, and Mycobacterium tuberculosis (Groenen et al. (1993) Mol. Microbiol. 10:1057-1065; Hoe et al. (1999) Emerg. Infect. Dis. 5:254-263; Masepohl et al. (1996) Biochim. Biophys. Acta 1307:26-30; Mojica et al. (1995) Mol. Microbiol. 17:85-93). The CRISPR loci differ from other SSRs by the structure of the repeats, which have been termed short regularly spaced repeats (SRSRs) (Jansen et al. (2002) OMICS J. Integr. Biol. 6:23-33; Mojica et al. (2000) Mol. Microbiol. 36:244-246). The repeats are short elements that occur in clusters and are always regularly spaced by unique intervening sequences of constant length (Mojica et al. (2000) Mol. Microbiol. 36:244-246). Although the repeat sequences are highly conserved between strains, the number of interspersed repeats and the sequences of the spacer regions differ from strain to strain (van Embden et al. (2000) J. Bacteriol. 182:2393-2401). Methods for identifying CRISPR regions are well known in the art (see, for example, the above references, which are incorporated by reference herein in their entirety, as well as the methods used in Example 1). The methods of the present invention are exemplified herein by experiments involving L. acidophilus; however, one of skill in the art would recognize that the methods may be used for detection and/or strain identification of any bacterium having a CRISPR region.
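As a rough illustration of how a CRISPR-like array can be located in a sequence, the Python sketch below looks for an exact k-mer that recurs at least a minimum number of times with a regularly sized spacer between consecutive copies. It is a hypothetical simplification, not one of the published methods cited above: dedicated CRISPR finders tolerate mismatches in the repeats, merge overlapping hits, and search a range of repeat lengths. The repeat length `k` (29) and the spacer window (20 to 45 bp) are assumed parameters chosen to match the repeat and spacer sizes described in this document, and the function name is invented.

```python
"""Hypothetical sketch: find exact, regularly spaced repeats (CRISPR-like arrays)."""
import random
from collections import defaultdict


def find_crispr_like_arrays(seq, k=29, min_copies=3, min_spacer=20, max_spacer=45):
    """Return (repeat, [start positions]) for each k-mer that occurs at least
    `min_copies` times in a row, with every spacer between consecutive copies
    between `min_spacer` and `max_spacer` bp long (0-based positions)."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    arrays = []
    for kmer, starts in positions.items():
        if len(starts) < min_copies:
            continue
        run = [starts[0]]
        for prev, cur in zip(starts, starts[1:]):
            spacer_len = cur - (prev + k)  # bases between the two copies
            if min_spacer <= spacer_len <= max_spacer:
                run.append(cur)
            else:
                if len(run) >= min_copies:
                    arrays.append((kmer, run))
                run = [cur]
        if len(run) >= min_copies:
            arrays.append((kmer, run))
    return arrays


def random_dna(n, rng):
    """Random A/C/G/T string, used only to build a synthetic test sequence."""
    return "".join(rng.choice("ACGT") for _ in range(n))


# Synthetic sequence: 200 nt, six copies of a 29-nt repeat separated by
# 32-nt spacers, then 200 nt more.
rng = random.Random(1)
repeat = random_dna(29, rng)
genome = (random_dna(200, rng)
          + "".join(repeat + random_dna(32, rng) for _ in range(5))
          + repeat + random_dna(200, rng))
for kmer, starts in find_crispr_like_arrays(genome):
    if kmer == repeat:
        print(starts)
```

Because the scan keys on exact k-mers, windows that overlap the repeat by a few bases can also be reported when neighboring spacers happen to share their first bases; a fuller tool would collapse such overlapping hits into one array.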

The number of nucleotides in a repeat is generally about 20 to about 40 base pairs, but may be about 20 to about 39 base pairs, about 20 to about 37 base pairs, about 20 to about 35 base pairs, about 20 to about 33 base pairs, about 20 to about 30 base pairs, about 21 to about 40 base pairs, about 21 to about 39 base pairs, about 21 to about 37 base pairs, about 23 to about 40 base pairs, about 23 to about 39 base pairs, about 23 to about 37 base pairs, about 25 to about 40 base pairs, about 25 to about 39 base pairs, about 25 to about 37 base pairs, about 25 to about 35 base pairs, or about 28 or 29 base pairs. The number of repeats may range from about 1 to about 140, from about 1 to about 100, from about 2 to about 100, from about 5 to about 100, from about 10 to about 100, from about 15 to about 100, from about 20 to about 100, from about 25 to about 100, from about 30 to about 100, from about 35 to about 100, from about 40 to about 100, from about 45 to about 100, from about 50 to about 100, from about 1 to about 135, from about 1 to about 130, from about 1 to about 125, from about 1 to about 120, from about 1 to about 115, from about 1 to about 110, from about 1 to about 105, from about 1 to about 95, from about 1 to about 90, from about 1 to about 80, from about 1 to about 70, from about 1 to about 60, from about 1 to about 50, from about 10 to about 140, from about 10 to about 130, from about 10 to about 120, from about 10 to about 110, from about 10 to about 95, from about 10 to about 90, from about 20 to about 80, from about 30 to about 70, from about 30 to about 60, from about 30 to about 50, from about 30 to about 40, or about 32.

The nucleotide sequences disclosed herein may be used to detect Lactobacillus species and/or to differentiate bacterial strains, including differentiating L. acidophilus NCFM® strains from other L. acidophilus strains. The detection and/or differentiation is based on the identification of novel CRISPR regions in L. acidophilus NCFM®, L. brevis, L. casei, and L. delbrueckii subspecies bulgaricus. As these CRISPR regions are strain-specific, any method assaying for the presence of these specific sequences is encompassed by the current invention. The present invention is applicable to medical testing, food testing, agricultural testing, and environmental testing.

Diagnostic assays to detect the presence of a nucleic acid molecule in a sample are disclosed. These methods comprise obtaining a sample, amplifying a region of DNA comprising at least one of SEQ ID NOS:1-7 and 37-48, or a variant thereof, to create amplified DNA, and detecting the amplified DNA. Detection of amplified DNA is specific for a Lactobacillus species. Different strains of a species of Lactobacillus, such as L. acidophilus, may yield amplified DNA of different sizes. Therefore, this method may also be used as a tool for strain differentiation. The method may further comprise sequencing the amplified DNA and detecting the presence of a Lactobacillus species, such as L. acidophilus, L. brevis, L. casei, or L. delbrueckii, based on the sequencing results. Alternatively, the method may further comprise adding to the amplified DNA at least one restriction enzyme that recognizes one or more sites in the amplified DNA, incubating the restriction enzyme with the amplified DNA for a time sufficient to form restriction fragments, determining the number and size of the restriction fragments, and detecting the presence of a Lactobacillus species, such as L. acidophilus, L. brevis, L. casei, or L. delbrueckii, based on the number and size of the restriction fragments.
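Because CRISPR loci vary mainly by insertion or deletion of whole repeat-spacer units, a difference in amplicon size between two strains can be converted into an approximate number of units gained or lost. The Python sketch below assumes a uniform unit of one 29-bp repeat plus one 32-bp spacer (61 bp), taken from the L. acidophilus locus described in this document; real spacer lengths can vary, so the unexplained residual is returned as a sanity check. The function name and the example sizes are hypothetical.

```python
"""Hypothetical sketch: convert an amplicon-size difference into repeat-spacer units."""


def estimate_unit_change(observed_bp, reference_bp, repeat_bp=29, spacer_bp=32):
    """Return (units, residual_bp): the nearest whole number of repeat-spacer
    units separating the two amplicons, and the size left unexplained."""
    unit_bp = repeat_bp + spacer_bp
    difference = observed_bp - reference_bp
    units = round(difference / unit_bp)
    return units, difference - units * unit_bp


# Hypothetical gel estimates: an amplicon 122 bp shorter than the reference
# is consistent with the loss of two 61-bp units.
print(estimate_unit_change(1978, 2100))  # (-2, 0)
```

A large residual suggests that the assumed unit size does not fit, for example because spacers of unusual length were gained or lost, or because the gel size estimates are imprecise.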

Methods for typing a Lactobacillus bacterial strain are provided. These methods comprise obtaining a sample, amplifying a region of DNA comprising at least one of the nucleotide sequences set forth in SEQ ID NOS:1-7 and 37-48, or a variant thereof, in the sample to create amplified DNA, and typing the bacterial strain based on the amplified DNA. This typing may be done by adding to the amplified DNA at least one restriction enzyme that recognizes one or more sites in the amplified DNA, incubating the restriction enzyme with the amplified DNA for a time sufficient to form restriction fragments, determining the number of the restriction fragments and their size, and typing the bacterial strain based on the number and size of the restriction fragments. Typing may also be done by sequencing the amplified DNA and typing the bacterial strain based on the sequencing results. In one embodiment, the region of DNA to be amplified comprises SEQ ID NO:1. In another embodiment, the region of DNA comprises a nucleotide sequence having at least 75% sequence identity to at least one of SEQ ID NOS:1-7 and 37-48.
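Typing from sequencing results can be sketched by splitting a sequenced CRISPR amplicon at each repeat copy and comparing the resulting spacer sets between strains, since strains differ mainly in which repeat-spacer units they carry. This Python sketch is an illustrative assumption, not the method used in the examples: it requires exact matches to a single repeat sequence, whereas the repeats in a real locus can carry variant nucleotides, and the function names and toy sequences are hypothetical.

```python
"""Hypothetical sketch: type strains by comparing CRISPR spacer content."""
import re


def extract_spacers(array_seq, repeat):
    """Return the sequences lying between consecutive exact copies of `repeat`."""
    starts = [m.start() for m in re.finditer(f"(?={re.escape(repeat)})", array_seq)]
    return [array_seq[a + len(repeat):b] for a, b in zip(starts, starts[1:])]


def compare_spacers(spacers_a, spacers_b):
    """Split spacers into those shared by both strains and those unique to each."""
    set_a, set_b = set(spacers_a), set(spacers_b)
    return {"shared": set_a & set_b, "only_a": set_a - set_b, "only_b": set_b - set_a}


# Toy example with a 3-nt "repeat" (GGG); real repeats are about 28-29 bp.
strain_a = extract_spacers("GGGAAAGGGCCCGGG", "GGG")  # ['AAA', 'CCC']
strain_b = extract_spacers("GGGAAAGGGTTTGGG", "GGG")  # ['AAA', 'TTT']
print(compare_spacers(strain_a, strain_b))
```

Spacers found in one strain but not the other point to the repeat-spacer units that distinguish the two strains.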

The amplified DNA may be obtained by providing a first primer that binds to a region of DNA flanking the CRISPR region, such as DNA upstream of the CRISPR region, and a second primer that binds to a region of DNA flanking the CRISPR region, such as DNA downstream of the CRISPR region; using the primers in a PCR reaction to create amplified DNA; separating the amplified DNA on a gel to produce a band showing the size of the amplified CRISPR DNA; and typing the bacterial strain based on the band size. The size of the amplified DNA is characteristic of the strain of Lactobacillus. In one embodiment, the first primer binds to a region of DNA upstream of the CRISPR region, such as that set forth in SEQ ID NO:49, and the second primer binds to a region of DNA downstream of the CRISPR region, such as that set forth in SEQ ID NO:50.
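For a primer pair flanking the CRISPR region, the expected band size can be predicted in silico from a reference sequence: locate the forward primer on the top strand, locate the reverse complement of the reverse primer downstream of it, and measure the distance between the outer ends. The Python sketch below assumes exact primer matches and a linear template; the function names, primers, and template are made up for illustration and are not SEQ ID NOS:49 and 50.

```python
"""Hypothetical sketch: predict the PCR product size for a flanking primer pair."""

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon_size(template, forward, reverse):
    """Size in bp of the product from the 5' end of `forward` to the 5' end of
    `reverse` (written 5'->3' on the bottom strand), or None if either primer
    lacks an exact binding site in the expected orientation."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    site = template.find(reverse_complement(reverse.upper()), start + len(forward))
    if site == -1:
        return None
    return site + len(reverse) - start


# Toy template: 4 nt, forward site, 20-nt "CRISPR region", reverse site, 4 nt.
template = "AAAA" + "GGCATC" + "T" * 20 + "CCGTAA" + "AAAA"
print(amplicon_size(template, "GGCATC", "TTACGG"))  # 32
```

Comparing the predicted size for a reference strain with the band observed for another strain gives the size difference that reflects its gained or lost repeat-spacer units.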

Alternatively, the amplified DNA may be obtained by providing a first primer that binds to a repetitive sequence in a CRISPR region; providing a second primer that binds to DNA flanking (i.e., upstream or downstream of) the CRISPR region; using the primers in a PCR reaction to create amplified DNA; separating the amplified DNA on a gel to produce a distinct band pattern showing the number and sizes of the amplified DNA; and typing the bacterial strain based on the pattern. The number and sizes of the bands are diagnostic of the strain of Lactobacillus. In one embodiment, the first primer binds to any of SEQ ID NOS:2-7 and 37-48, and the second primer binds to DNA flanking any of SEQ ID NOS:2-7 and 37-48.

This method, wherein one primer binds to any of SEQ ID NOS:2-7 and 37-48, produces a number of bands of varying sizes, depending on the number and spacing of the repeats in relation to the anchored primer. For example, if the repeat region is present five times, the primer complementary to the repeat region will bind in five places and generate five bands that may be visualized as a fingerprint on an agarose or polyacrylamide gel. The PCR products may be amplified to different extents, so some of the resulting bands may be harder to visualize than others, or not visible at all. The distinct band pattern shows the number and size of the amplified DNA, and may be used to characterize Lactobacillus strains, including L. acidophilus, L. brevis, L. casei, and L. delbrueckii strains.
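The expected fingerprint for a repeat-anchored primer can be predicted from the positions of the repeat copies. In the Python sketch below, one primer is assumed to anneal to every exact copy of the repeat and extend toward a downstream flanking primer whose binding site ends at a known coordinate, so each copy yields one product whose size is the distance from that copy to the end of the flanking primer site. Exact matching, the coordinate convention, the function names, and all positions are illustrative assumptions; as noted above, real products amplify unevenly and some bands may not be seen.

```python
"""Hypothetical sketch: predict band sizes for a repeat-anchored primer."""
import re


def repeat_starts(template, repeat):
    """0-based start positions of every exact (possibly overlapping) repeat copy."""
    return [m.start() for m in re.finditer(f"(?={re.escape(repeat)})", template)]


def fingerprint_bands(starts, flank_site_end):
    """One product per repeat copy upstream of the flanking primer site; each
    runs from the copy's start to `flank_site_end` (exclusive end coordinate)."""
    return sorted((flank_site_end - s for s in starts if s < flank_site_end),
                  reverse=True)


# Five repeat copies spaced 61 bp apart (29-bp repeat + 32-bp spacer), with a
# hypothetical flanking primer site ending at position 500.
starts = [100 + 61 * i for i in range(5)]
print(fingerprint_bands(starts, 500))  # [400, 339, 278, 217, 156]
```

Copies closest to the flanking primer give the shortest products, so the band ladder reads the array from its far end inward.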

The term “sample” is intended to include tissues, cells, and biological fluids present in or isolated from a subject, as well as cells from starter cultures (mother, seed, bulk/set, concentrated, dried, lyophilized, frozen), or food/dairy/feed products carrying such cultures, or derived from the use of such cultures. The sample may be a dietary supplement, bioprocessing fermentate, or a subject that has ingested a substance comprising the nucleotide sequence. That is, the detection method of the invention can be used to detect genomic DNA comprising a disclosed nucleotide sequence in a sample both in vitro and in vivo. In vitro techniques for detection of genomic DNA comprising the disclosed nucleotide sequences include, but are not limited to, Southern hybridizations. Results obtained with a sample from the food, supplement, culture, product, or subject may be compared to results obtained with a sample from a control culture, product, or subject. In one embodiment, the sample contains genomic DNA from a starter culture.

Amplification of the desired region of DNA may be achieved by any method known in the art, including polymerase chain reaction (PCR). By “amplification” is intended the production of additional copies of a nucleic acid sequence. This is generally carried out using PCR technologies well known in the art (Dieffenbach and Dveksler (1995) PCR Primer, a Laboratory Manual (Cold Spring Harbor Press, Plainview, N.Y.)). By “polymerase chain reaction” or “PCR” is intended a method such as that disclosed in U.S. Pat. Nos. 4,683,195 and 4,683,202, herein incorporated by reference, which describe a method for increasing the concentration of a segment of a target sequence in a mixture of genomic DNA without cloning or purification. The length of the amplified segment of the desired target sequence is determined by the relative positions of two oligonucleotide primers with respect to each other, and therefore this length is a controllable parameter. By virtue of the repeating aspect of the process, the method is referred to as a chain reaction, hence “PCR”. Because the desired amplified segments of the target sequence become the predominant sequences (in terms of concentration) in the mixture, they are said to be “PCR amplified.”
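Why the amplified segments come to dominate the mixture can be made concrete with the standard idealized model, in which each cycle multiplies the number of target copies by (1 + E), where E is the per-cycle efficiency (E = 1 means perfect doubling). This Python sketch ignores the plateau that real reactions reach as primers and nucleotides are consumed, so it overestimates late-cycle yields; the function names and the starting copy number are hypothetical.

```python
"""Idealized PCR yield: N = N0 * (1 + E) ** n (no plateau modeled)."""
import math


def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Copies after `cycles` cycles at a constant per-cycle `efficiency` (0..1)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies, target_copies, efficiency=1.0):
    """Smallest whole number of cycles that reaches `target_copies`."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 0 and at most 1")
    ratio = target_copies / initial_copies
    return max(0, math.ceil(math.log(ratio) / math.log(1.0 + efficiency)))


# 10 template copies of the CRISPR locus, 30 cycles of perfect doubling:
print(pcr_copies(10, 30))  # 10737418240.0
print(cycles_needed(1, 1e6))  # 20
```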

In a PCR approach, oligonucleotide primers can be designed for use in PCR reactions to amplify all or part of the CRISPR locus. By “primer” is intended an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product that is complementary to a nucleic acid strand is induced (i.e., in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer is preferably single stranded for maximum efficiency in amplification, but may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products. Preferably, the primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the inducing agent. The exact lengths of the primers will depend on many factors, including temperature, source of primer, and the use of the method. PCR primers are preferably at least about 10 nucleotides in length, and most preferably at least about 20 nucleotides in length.
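The preferred primer lengths above can be checked together with two quick, widely used rules of thumb: GC content and the Wallace rule, which estimates melting temperature as Tm = 2 °C × (A + T) + 4 °C × (G + C). The Wallace rule is only a rough guide intended for short oligonucleotides (roughly 14 to 20 nt) under standard salt conditions; nearest-neighbor methods are more accurate. The function name, the 20-nt default threshold, and the example primer are illustrative assumptions.

```python
"""Hypothetical sketch: quick primer checks (length, GC content, Wallace-rule Tm)."""


def primer_report(primer, min_length=20):
    """Summarize a primer written 5'->3' using only A, C, G, T."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty A/C/G/T sequence")
    gc = seq.count("G") + seq.count("C")
    at = len(seq) - gc
    return {
        "length": len(seq),
        "gc_percent": 100.0 * gc / len(seq),
        "tm_wallace_c": 2 * at + 4 * gc,  # rough estimate, short oligos only
        "meets_length": len(seq) >= min_length,
    }


print(primer_report("ACGTACGTACGTACGTACGT"))
```

For the 20-nt example this reports a length of 20, 50% GC content, a Wallace Tm of 60 °C, and that the preferred length of at least about 20 nucleotides is met.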

Compositions of the invention include oligonucleotide primers that may be used to amplify these repetitive regions. Examples of PCR primers that may be used in the methods of the invention include primers that bind to a region of genomic DNA flanking the CRISPR region, such as those found in SEQ ID NOS:49 and 50, a primer that binds to SEQ ID NO:36, primers that bind within the CRISPR region, or a combination thereof. The forward and reverse primers are designed to amplify all or part of a CRISPR region. By “flanking” is intended a region 5′ (upstream) or 3′ (downstream) of the sequence. In some embodiments, at least one primer binds to a DNA sequence flanking the CRISPR region. In some embodiments, one primer binds to the first repetitive sequence (for example, SEQ ID NO:2) and one primer binds to a flanking DNA sequence (for example, SEQ ID NO:36), thereby amplifying the entire CRISPR region. In some embodiments, both primers bind to regions of DNA flanking the CRISPR region. Primers that are designed to bind to DNA flanking a CRISPR region would be species specific, as this flanking DNA would not be expected to share enough sequence identity across all Lactobacillus species.
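Whether a flanking primer is likely to be specific can be screened in silico by listing its exact binding sites on both strands of an available genome sequence; a primer intended to amplify a single locus should ideally have one site. This Python sketch finds only perfect matches, whereas real primers can also anneal with mismatches, so it is a first filter rather than a full specificity test; the function names and sequences are hypothetical.

```python
"""Hypothetical sketch: list exact primer binding sites on both DNA strands."""
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def binding_sites(genome, primer):
    """0-based top-strand coordinates where the primer (plus strand) or its
    reverse complement (minus strand) matches exactly."""
    genome, primer = genome.upper(), primer.upper()

    def find_all(pattern):
        # Lookahead so that overlapping matches are all reported.
        return [m.start() for m in re.finditer(f"(?={re.escape(pattern)})", genome)]

    return {"+": find_all(primer), "-": find_all(reverse_complement(primer))}


print(binding_sites("AAGGCTTTAAGCCTT", "GGCT"))  # {'+': [2], '-': [9]}
```

Run against the genomes of related Lactobacillus species, a flanking primer that returns no sites outside the target species is consistent with the species specificity described above.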

The repetitive sequences in these CRISPR regions show nucleotide homology to each other (see FIG. 7). The L. acidophilus repetitive sequences are at least 86% identical to each other. The L. brevis repetitive sequences are at least 82% identical to each other. The L. delbrueckii ssp. bulgaricus repetitive sequences are at least 89% identical to each other. The L. acidophilus repetitive sequences are at least 57% identical to the L. brevis repetitive sequences. The L. acidophilus repetitive sequences are at least 71% identical to the L. casei repetitive sequences. The L. acidophilus repetitive sequences are at least 75% identical to the L. delbrueckii repetitive sequences. The L. brevis repetitive sequences are at least 64% identical to the L. casei repetitive sequences. The L. brevis repetitive sequences are at least 64% identical to the L. delbrueckii repetitive sequences. The L. casei repetitive sequences are at least 71% identical to the L. delbrueckii repetitive sequences.
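Statements of the form "at least X% identical" correspond to the lowest pairwise percent identity within one set of repeats or between two sets. For repeats that are already aligned without gaps and have equal length, identity is the fraction of matching positions; the Python sketch below computes it and the minimum over all pairs. Repeats of different lengths (for example, a 29-bp repeat versus a 28-bp repeat) would first need a gapped alignment, which this sketch does not perform. The function names and toy sequences are hypothetical, and the figures above were not necessarily computed this way.

```python
"""Hypothetical sketch: ungapped percent identity between aligned repeats."""
from itertools import combinations, product


def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def min_identity_within(repeats):
    """Lowest identity among all pairs of repeats from one species (needs >= 2)."""
    return min(percent_identity(a, b) for a, b in combinations(repeats, 2))


def min_identity_between(repeats_1, repeats_2):
    """Lowest identity between any repeat of one species and any of another."""
    return min(percent_identity(a, b) for a, b in product(repeats_1, repeats_2))


# Toy 4-nt "repeats"; real repeats are 28-29 bp.
print(min_identity_within(["AAAA", "AAAT", "AATT"]))  # 50.0
print(min_identity_between(["AAAA"], ["AAAT", "TTTT"]))  # 0.0
```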

When the DNA sequence flanking the CRISPR region is known, one of skill in the art would be able to design primers for amplifying the CRISPR region based on this known flanking sequence. When the DNA sequence flanking the CRISPR region is not yet known, one of skill in the art would be able to determine this flanking sequence using methods known in the art. The entire genome of L. acidophilus NCFM is provided in U.S. Provisional Application No. 60/622,712 and Altermann et al. (2005) Proc. Natl. Acad. Sci. U.S.A. 102:3906-3912, herein incorporated by reference in their entirety. The genome of L. plantarum is provided in Kleerebezem et al. (2003) Proc. Natl. Acad. Sci. U.S.A. 100:1990-1995. The entire genome of L. johnsonii is provided in Pridmore et al. (2004) Proc. Natl. Acad. Sci. U.S.A. 101:2512-2517.

Methods for designing PCR primers and PCR cloning are generally known in the art and are disclosed in Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.). See also Innis et al., eds. (1990) PCR Protocols: A Guide to Methods and Applications (Academic Press, New York); Innis and Gelfand, eds. (1995) PCR Strategies (Academic Press, New York); and Innis and Gelfand, eds. (1999) PCR Methods Manual (Academic Press, New York). Known methods of PCR include, but are not limited to, methods using paired primers, nested primers, single specific primers, degenerate primers, gene-specific primers, vector-specific primers, partially mismatched primers, and the like.

With PCR, it is possible to amplify a single copy of a specific target sequence to a level detectable by several different methodologies (e.g., hybridization with a labeled probe; incorporation of biotinylated primers followed by avidin-enzyme conjugate detection; incorporation of ³²P-labeled deoxynucleotide triphosphates, such as dCTP or dATP, into the amplified segment). In addition to genomic DNA, any oligonucleotide sequence can be amplified with the appropriate set of primer molecules. In particular, the amplified segments created by the PCR process are themselves efficient templates for subsequent PCR amplifications.

Amplification in PCR requires “PCR reagents” or “PCR materials,” which herein are defined as all reagents necessary to carry out amplification except the polymerase, primers, and template. PCR reagents normally include nucleic acid precursors (dCTP, dTTP, etc.) and buffer.

Once the DNA comprising the CRISPR locus or a portion thereof has been amplified, it may then be digested (cut) with a restriction enzyme. As used herein, the terms “restriction endonucleases” and “restriction enzymes” refer to bacterial enzymes, each of which cuts double-stranded DNA at or near a specific nucleotide sequence. Restriction enzymes are well known in the art and may be readily obtained, for example, from a variety of commercial sources (for example, New England Biolabs, Inc., Beverly, Mass.). Similarly, methods for using restriction enzymes are generally well known and routine in the art. Preferred restriction enzymes are those that produce between 10 and 24 fragments of DNA when cutting the CRISPR locus (for example, SEQ ID NO: 1). Examples of such enzymes include, but are not limited to, AluI, MseI, and Tsp509I. Fragments of DNA obtained using restriction enzymes may be detected, for example, as bands by gel electrophoresis. Restriction enzymes may be used to create Restriction Fragment Length Polymorphisms (RFLPs). RFLPs are, in essence, unique fingerprint snapshots of a piece of DNA, whether a whole chromosome (genome) or a part thereof, such as the region of the genome comprising the novel L. acidophilus CRISPR locus disclosed in the present invention.
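
The amplicon sequence is known, so the fragment sizes expected from a given enzyme can be predicted before a gel is run. The following is a minimal in silico digest sketch. It assumes the commonly listed recognition sites and top-strand cut positions: AluI AG^CT, MseI T^TAA, and Tsp509I ^AATT. All three sites are palindromic, so scanning one strand is enough. The sketch treats the amplicon as linear and reports top-strand fragment lengths only, ignoring the short single-stranded overhangs that MseI and Tsp509I leave. Fragments of similar size may co-migrate, so a gel can show fewer bands than there are fragments.

```python
# Recognition site (top strand) and cut offset within the site, e.g. AG^CT -> 2.
ENZYMES = {
    "AluI": ("AGCT", 2),     # AG^CT
    "MseI": ("TTAA", 1),     # T^TAA
    "Tsp509I": ("AATT", 0),  # ^AATT
}


def cut_positions(seq: str, site: str, offset: int) -> list[int]:
    """Top-strand cut coordinates for every (possibly overlapping) site occurrence."""
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i + offset)
        i = seq.find(site, i + 1)
    return positions


def digest(seq: str, enzyme: str) -> list[int]:
    """Fragment lengths (bp, in 5'->3' order) from cutting a linear sequence."""
    seq = seq.upper()
    site, offset = ENZYMES[enzyme]
    cuts = sorted({p for p in cut_positions(seq, site, offset) if 0 < p < len(seq)})
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


toy = "CCAATTCCAATT"
print(digest(toy, "Tsp509I"))   # [2, 6, 4]
```

Applied to a real amplicon sequence, `len(digest(amplicon, "AluI"))` would give the predicted fragment count, which can then be compared with the observed banding pattern.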

RFLPs are generated by cutting (“restricting”) a DNA molecule with a restriction endonuclease. Many hundreds of such enzymes, which bacteria produce naturally, have been isolated. In essence, bacteria use such enzymes as a defensive system, to recognize and then cleave (restrict) any foreign DNA molecules that might enter the bacterial cell (e.g., during a viral infection). Each of the many hundreds of different restriction enzymes has been found to cut (i.e., “cleave” or “restrict”) DNA at a different sequence of the 4 basic nucleotides (A, T, G, C) that make up all DNA molecules; e.g., one enzyme might specifically and only recognize the sequence A-A-T-G-A-C, while another might specifically and only recognize the sequence G-T-A-C-T-A. Depending on the enzyme involved, such recognition sequences may vary in length from as few as 4 nucleotides to as many as 21 nucleotides. The longer the recognition sequence, the fewer restriction fragments will result, because a longer recognition site is less likely to occur by chance at multiple positions in the DNA.
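
The link between site length and fragment number can be made quantitative with a simple random-sequence model. Suppose bases occur independently with frequencies p(A), p(C), p(G), p(T). A given site of length n is then expected about (L − n + 1) × ∏ p(base) times in L base pairs. At equal base frequencies this is roughly once every 4^n bp. The sketch below assumes that model and, for illustration, uses the 2,582-bp amplicon length from Example 2. Real sequences can deviate substantially, particularly AT-rich or repetitive ones such as CRISPR arrays. For a non-palindromic site, occurrences on the opposite strand would add roughly the same number again.

```python
from typing import Optional


def expected_sites(length_bp: int, site: str,
                   base_freq: Optional[dict] = None) -> float:
    """Expected occurrences of `site` on one strand of a random sequence."""
    if base_freq is None:
        base_freq = {b: 0.25 for b in "ACGT"}   # equal base composition
    p_site = 1.0
    for base in site.upper():
        p_site *= base_freq[base]               # independent bases
    return (length_bp - len(site) + 1) * p_site


for site in ("AGCT", "GAATTC"):                 # a 4-bp and a 6-bp recognition site
    print(site, round(expected_sites(2582, site), 2))
# AGCT 10.07
# GAATTC 0.63
```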

Following the digestion, the resultant individual fragments are separated from one another based on their size. Any method suitable for separating DNA is encompassed by the methods of the present invention, including, but not limited to, gel electrophoresis, high performance liquid chromatography (HPLC), mass spectrometry, and use of a microfluidic device. In one embodiment, the DNA fragments are separated by agarose gel electrophoresis. Gel electrophoresis separates charged molecules of different sizes by their rate of movement through a stationary gel under the influence of an electric field. These separated DNA fragments can easily be visualized, for example, by staining with ethidium bromide and viewing the gel under UV illumination. The banding pattern reflects the sizes of the restriction-digested DNA fragments.
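
Band sizes are usually read against a size ladder run on the same gel. Over a limited range, the migration distance of linear DNA in agarose is approximately linear in the logarithm of fragment size. A least-squares line through log10(size) versus distance for the ladder bands therefore gives a standard curve. The sketch below assumes that semi-log relationship and uses hypothetical ladder values. It is not part of the disclosed method, and sizes outside the ladder's range should not be extrapolated.

```python
import math


def fit_standard_curve(ladder):
    """Least-squares fit of log10(size_bp) = intercept + slope * distance.

    `ladder` is a list of (distance_mm, size_bp) pairs for the marker bands.
    """
    xs = [d for d, _ in ladder]
    ys = [math.log10(size) for _, size in ladder]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_size(distance_mm, curve):
    """Interpolate a band's size (bp) from its migration distance."""
    intercept, slope = curve
    return 10 ** (intercept + slope * distance_mm)


# Hypothetical ladder: (distance in mm, size in bp).
ladder = [(10.0, 1000), (15.0, 500), (20.0, 100)]
curve = fit_standard_curve(ladder)
print(round(estimate_size(17.5, curve)))   # estimated size of a band at 17.5 mm
```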

As an alternative to performing RFLP on the amplified CRISPR locus, the sequence of the amplified DNA may be obtained by any method known in the art, including automatic and manual sequencing methods. See, for example, Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.); Roe et al. (1996) DNA Isolation and Sequencing (Essential Techniques Series, John Wiley & Sons).

Other methods that utilize the novel CRISPR repetitive regions of the invention to detect and/or type Lactobacillus strains are also encompassed by the invention. These methods include hybridization methods, either using a nucleic acid molecule of the invention as a probe, or a nucleic acid molecule capable of hybridizing to a disclosed nucleotide sequence of the present invention. See, for example, Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.).

In hybridization techniques, the hybridization probe(s) may be genomic DNA fragments, PCR-amplified products, or other oligonucleotides, and may comprise all or part of a known nucleotide sequence disclosed herein. In addition, the probe may be labeled with a detectable group such as ³²P, or any other detectable marker, such as other radioisotopes, a fluorescent compound, an enzyme, or an enzyme co-factor. The term “labeled,” with regard to the probe, is intended to encompass direct labeling of the probe by coupling (i.e., physically linking) a detectable substance to the probe, as well as indirect labeling of the probe by reactivity with another reagent that is directly labeled. Examples of indirect labeling include end-labeling of a DNA probe with biotin such that it can be detected with fluorescently labeled streptavidin.

Probes for hybridization can be made by labeling synthetic oligonucleotides based on the known CRISPR region nucleotide sequences disclosed herein. In one embodiment, the entire L. acidophilus CRISPR region nucleotide sequence (SEQ ID NO: 1) is used as a probe to detect and/or differentiate an L. acidophilus strain. In another embodiment, the probe is a fragment of a nucleotide sequence disclosed herein, such as a probe consisting of a single repetitive sequence, as found in any of SEQ ID NOS:2-7 and 37-48. In yet another embodiment, the probe is a sequence found in a spacer region, for example in any of SEQ ID NOS:8-35. In another embodiment, the probe is a flanking region, such as that of SEQ ID NO:36. The hybridization probe typically comprises a region of nucleotide sequence that hybridizes under stringent conditions to at least about 10, preferably about 20, more preferably about 50, 75, 100, 125, 150, 175, 200, 250, 300, 350, or 400 consecutive nucleotides of a CRISPR region nucleotide sequence of the invention or a fragment or variant thereof. Preparation of probes for hybridization is generally known in the art and is disclosed in Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.), herein incorporated by reference.

Substantially identical sequences will hybridize to each other under stringent conditions. By “stringent conditions” is intended conditions under which a probe will hybridize to its target sequence to a detectably greater degree than to other sequences (e.g., at least 2-fold over background). Stringent conditions are known in the art and can be found in Current Protocols in Molecular Biology (John Wiley & Sons, New York (1989)), 6.3.1-6.3.6.

When using probes, stringent conditions will be those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3, and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides) and at least about 60° C. for long probes (e.g., greater than 50 nucleotides).

The post-hybridization washes are instrumental in controlling specificity. The two critical factors are the ionic strength and the temperature of the final wash solution. For the detection of sequences that hybridize to a full-length or approximately full-length target sequence, the temperature under stringent conditions is selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. However, stringent conditions would encompass temperatures in the range of 1° C. to 20° C. lower than the Tm, depending on the desired degree of stringency as otherwise qualified herein. For DNA-DNA hybrids, the Tm can be determined using the equation of Meinkoth and Wahl (1984) Anal. Biochem. 138:267-284:

Tm = 81.5° C. + 16.6 (log M) + 0.41 (% GC) − 0.61 (% form) − 500/L

where M is the molarity of monovalent cations, % GC is the percentage of guanine and cytosine nucleotides in the DNA, % form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. The Tm is the temperature (under defined ionic strength and pH) at which 50% of a complementary target sequence hybridizes to a perfectly matched probe.
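
The Meinkoth and Wahl equation above is straightforward to evaluate. The sketch below assumes that “log M” is the base-10 logarithm and that the inputs are known for the hybrid in question. It also applies the 1° C. to 20° C. stringency offset described above, with about 5° C. as the default. The example inputs are hypothetical.

```python
import math


def tm_meinkoth_wahl(na_molar: float, gc_percent: float,
                     formamide_percent: float, hybrid_length_bp: int) -> float:
    """Tm (deg C) of a DNA-DNA hybrid per Meinkoth and Wahl (1984)."""
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / hybrid_length_bp)


def stringent_wash_temp(tm: float, degrees_below: float = 5.0) -> float:
    """Wash temperature a chosen number of degrees below Tm (1-20 deg C per the text)."""
    if not 1.0 <= degrees_below <= 20.0:
        raise ValueError("stringent conditions span 1 to 20 deg C below Tm")
    return tm - degrees_below


# Hypothetical example: 1 M Na+, 50% GC, no formamide, 100-bp hybrid.
tm = tm_meinkoth_wahl(1.0, 50.0, 0.0, 100)
print(round(tm, 1), round(stringent_wash_temp(tm), 1))   # 97.0 92.0
```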

The ability to detect sequences with varying degrees of homology can be obtained by varying the stringency of the hybridization and/or washing conditions. To target sequences that are 100% identical (homologous probing), stringency conditions must be obtained that do not allow mismatching. By allowing mismatching of nucleotide residues to occur, sequences with a lower degree of similarity can be detected (heterologous probing). For every 1% of mismatching, the Tm is reduced by about 1° C.; therefore, hybridization and/or wash conditions can be manipulated to allow hybridization of sequences of a target percentage identity. For example, if sequences with ≥90% sequence identity are preferred, the hybridization and/or wash temperature can be set about 10° C. lower than it would be for perfectly matched sequences.
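
The rule of thumb that Tm drops by about 1° C. per 1% mismatch gives an estimate of the Tm of an imperfect hybrid. Stringency offsets can then be chosen relative to that estimate. The sketch below assumes the linear approximation holds over the mismatch range of interest. It is only an approximation and is not meant for highly divergent sequences.

```python
def tm_with_mismatch(tm_perfect: float, identity_percent: float) -> float:
    """Approximate Tm of a hybrid with the given percent identity (~1 deg C per 1% mismatch)."""
    if not 0.0 < identity_percent <= 100.0:
        raise ValueError("identity must be in (0, 100]")
    mismatch_percent = 100.0 - identity_percent
    return tm_perfect - 1.0 * mismatch_percent


# A probe with a hypothetical perfect-match Tm of 80 deg C, targeting >=90% identity:
print(tm_with_mismatch(80.0, 90.0))   # 70.0
```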

Exemplary low stringency conditions include hybridization with a buffer solution of 30-35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulfate) at 37° C., and a wash in 1× to 2×SSC (20×SSC=3.0 M NaCl/0.3 M trisodium citrate) at 50 to 55° C. Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1.0 M NaCl, 1% SDS at 37° C., and a wash in 0.5× to 1×SSC at 55 to 60° C. Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 0.1×SSC at 60 to 65° C. Optionally, wash buffers may comprise about 0.1% to about 1% SDS. Duration of hybridization is generally less than about 24 hours, usually about 4 to about 12 hours. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, Part I, Chapter 2 (Elsevier, New York); and Ausubel et al., eds. (1995) Current Protocols in Molecular Biology, Chapter 2 (Greene Publishing and Wiley-Interscience, New York). See Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.).
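
Working SSC strengths are usually prepared by diluting a 20×SSC stock. The sketch below uses the stock composition stated above (3.0 M NaCl, 0.3 M trisodium citrate) to show the dilution arithmetic and the resulting Na+ concentration, which is the M term in the Tm equation given earlier. Two assumptions are ours: each trisodium citrate contributes three Na+, and Na+ from SDS is ignored.

```python
STOCK_X = 20.0            # 20x SSC stock
STOCK_NACL_M = 3.0        # M NaCl in 20x SSC
STOCK_CITRATE_M = 0.3     # M trisodium citrate in 20x SSC


def stock_volume_ml(target_x: float, final_volume_ml: float) -> float:
    """Volume of 20x SSC needed for `final_volume_ml` of `target_x` SSC (C1V1 = C2V2)."""
    return target_x * final_volume_ml / STOCK_X


def sodium_molar(target_x: float) -> float:
    """Na+ molarity of `target_x` SSC: NaCl plus three Na+ per trisodium citrate."""
    scale = target_x / STOCK_X
    return scale * (STOCK_NACL_M + 3 * STOCK_CITRATE_M)


# 500 ml of the 0.1x SSC high-stringency wash:
print(stock_volume_ml(0.1, 500.0))   # 2.5 (ml of 20x SSC, made up to 500 ml)
print(round(sodium_molar(0.1), 4))   # 0.0195
```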

Methods that encompass hybridization techniques to detect or differentiate bacterial strains are also encompassed. These include, but are not limited to, Southern blotting (see, for example, Van Embden et al. (1993) J. Clin. Microbiol. 31:406-409), shift mobility assays (see, for example, U.S. Published Application No. 20030219778), sequencing assays using oligonucleotide arrays (see, for example, Pease et al. (1994) Proc. Natl. Acad. Sci. USA 91:5022-5026), spoligotyping (see, for example, Kamerbeek et al. (1997) J. Clin. Microbiol. 35:907-914), Fluorescent In Situ Hybridization (FISH) (see, for example, Amann et al. (1990) J. Bacteriol. 172:762-770), and heteroduplex tracking assays or heteroduplex mobility analysis (see, for example, White et al. (2000) J. Clin. Microbiol. 38:477-482).

The invention also encompasses kits for detecting the presence of the nucleic acids of the present invention in a sample. Such kits can be used for typing or detection of Lactobacillus strains present in, for example, a food product or starter culture, or in a subject that has consumed a probiotic material. For example, the kit may comprise PCR primers for amplification of a CRISPR locus, as well as a polymerase and other PCR materials for use in DNA amplification. The kit may also contain one or more restriction enzymes for use in RFLP analysis. The kit may contain a labeled compound or agent capable of detecting a disclosed nucleic acid sequence in a sample and means for determining the amount of the disclosed nucleic acid sequence in the sample (e.g., an oligonucleotide probe that binds to a nucleic acid sequence of the invention, e.g., any of SEQ ID NOS:1-50).

For oligonucleotide-based kits, the kit may comprise, for example: (1) an oligonucleotide, e.g., a detectably labeled oligonucleotide, that hybridizes to a disclosed nucleic acid sequence, or (2) a pair of primers useful for amplifying a disclosed nucleic acid molecule.

The kit may also comprise, e.g., a buffering agent, a preservative, or a protein-stabilizing agent. The kit may also comprise components necessary for detecting the detectable agent (e.g., an enzyme or a substrate). The kit may also contain a control sample or a series of control samples (both positive and negative) that can be assayed and compared with the test sample. Each component of the kit is usually enclosed within an individual container, and all of the various containers are within a single package along with instructions for use.

In one embodiment, the kit comprises multiple probes in an array format, such as those described, for example, in U.S. Pat. Nos. 5,412,087 and 5,545,531, and International Publication No. WO 95/00530, herein incorporated by reference. Probes for use in the array may be synthesized either directly onto the surface of the array, as disclosed in International Publication No. WO 95/00530, or prior to immobilization onto the array surface (Gait, ed. (1984) Oligonucleotide Synthesis: A Practical Approach (IRL Press, Oxford, England)). The probes may be immobilized onto the surface using techniques well known to one of skill in the art, such as those described in U.S. Pat. No. 5,412,087. Probes may be a nucleic acid or peptide sequence, preferably purified.

The arrays may be used to screen organisms, samples, or products to differentiate between Lactobacillus strains, or to verify the presence of a Lactobacillus species, such as L. acidophilus NCFM®. Binding to a capture probe is detected, for example, by signal generated from a label attached to the nucleic acid molecule comprising the disclosed nucleic acid sequence. The method can include contacting the molecule comprising the disclosed nucleic acid with a first array having a plurality of capture probes and a second array having a different plurality of capture probes. The results of each hybridization can be compared to analyze differences in the content between a first and second sample. The first plurality of capture probes can be from a control sample, e.g., a sample known to contain L. acidophilus NCFM®, or control subject, e.g., a food, including an animal feed or animal feed supplement, a dietary supplement, a starter culture sample, or a biological fluid. The second plurality of capture probes can be from an experimental sample, e.g., a subject that has consumed a probiotic material, a starter culture sample, a food, or a biological fluid.

These assays may be especially useful in microbial selection and quality control procedures where the detection of unwanted materials is essential. The detection of particular nucleotide sequences may also be useful in determining the genetic composition of food, fermentation products, or industrial microbes, or microbes present in the digestive system of animals or humans that have consumed probiotics.

Fragments and Variants

The invention includes isolated nucleic acid molecules comprising the nucleotide sequence of a CRISPR locus from L. acidophilus, L. brevis, L. casei, L. delbrueckii, or variants and fragments thereof. By “fragment” of a nucleic acid molecule is intended a portion of the nucleotide sequence. Fragments of nucleic acid molecules can be used as hybridization probes to detect and/or differentiate CRISPR regions from various bacteria, including Lactobacillus species, or can be used as primers in PCR amplification of CRISPR regions. Fragments of nucleic acids can also be bound to a physical substrate to comprise what may be considered a macro- or microarray (for example, U.S. Pat. Nos. 5,837,832 and 5,861,242). Such arrays of nucleic acids may be used to identify nucleic acid molecules with sufficient identity to the target sequences. By “nucleic acid molecule” is intended DNA molecules (e.g., cDNA or genomic DNA) and RNA molecules (e.g., mRNA) and analogs of the DNA or RNA generated using nucleotide analogs. The nucleic acid molecule can be single-stranded or double-stranded, but preferably is double-stranded DNA. A nucleotide fragment may be used as a hybridization probe or PCR primer as described above. Fragments of CRISPR region nucleic acid molecules comprise at least about 15, 20, 50, 75, 100, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, or 1900 nucleotides, or up to the total number of nucleotides present in a full-length CRISPR region nucleotide sequence as disclosed herein (for example, 1953 for SEQ ID NO:1).

Variants of the nucleotide sequences are encompassed in the present invention.

By “variant” is intended a sufficiently identical sequence. Accordingly, the invention encompasses isolated nucleic acid molecules that are sufficiently identical to any of the nucleotide sequences of SEQ ID NOS:1-50, or nucleic acid molecules that hybridize to any of the nucleotide sequences of SEQ ID NOS:1-50, or a complement thereof, under stringent conditions.

In general, nucleotide sequences that have at least about 45%, 55%, or 65% identity, preferably at least about 70% or 75% identity, more preferably at least about 78%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, or 90%, most preferably at least about 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to any of the nucleotide sequences of SEQ ID NOS:1-50, are defined herein as sufficiently identical.

Naturally occurring variants may exist within a population (e.g., the L. acidophilus population). Such variants can be identified by using well-known molecular biology techniques, such as PCR and hybridization as described above. Synthetically derived nucleotide sequences, for example, sequences generated by site-directed mutagenesis or PCR-mediated mutagenesis, that still allow strain differentiation or detection, are also included as variants. One or more nucleotide substitutions, additions, or deletions can be introduced into a nucleotide sequence disclosed herein, such that the substitutions, additions, or deletions do not affect the ability to differentiate strains based on any of the methods disclosed herein or known in the art, including, but not limited to, RFLP, sequencing, and hybridization. Examples of variants of a CRISPR repeat region can be found in SEQ ID NOS:2-7 and 37-48.

Sequence Identity

The nucleotide sequences encompassed by the present invention have a certain sequence identity. By “sequence identity” is intended the nucleotide residues that are the same when aligning two sequences for maximum correspondence over a specified comparison window. By “comparison window” is intended a contiguous segment of the two nucleotide sequences for optimal alignment, wherein the second sequence may contain additions or deletions (i.e., gaps) as compared to the first sequence. Generally, for nucleic acid alignments, the comparison window is at least 20 contiguous nucleotides in length, and optionally can be 30, 40, 50, 100, or longer. Those of skill in the art understand that to avoid a high similarity due to inclusion of gaps, a gap penalty is typically introduced and is subtracted from the number of matches.

To determine the percent identity of two nucleotide sequences, an alignment is performed. Percent identity of the two sequences is a function of the number of identical residues shared by the two sequences in the comparison window (i.e., percent identity = number of identical residues/total number of residues × 100). In one embodiment, the sequences are the same length. Methods similar to those mentioned below can be used to determine the percent identity between two sequences. The methods can be used with or without allowing gaps.

Mathematical algorithms can be used to determine the percent identity of two sequences. Non-limiting examples of mathematical algorithms are the algorithm of Karlin and Altschul (1990) Proc. Natl. Acad. Sci. USA 87:2264, modified as in Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90:5873-5877; the algorithm of Myers and Miller (1988) CABIOS 4:11-17; the local homology algorithm of Smith et al. (1981) Adv. Appl. Math. 2:482; the global alignment algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48:443-453; and the search-for-local-alignment method of Pearson and Lipman (1988) Proc. Natl. Acad. Sci. USA 85:2444-2448.
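
The sketch below shows how a global alignment feeds the percent-identity formula given earlier, using a minimal Needleman-Wunsch implementation. It uses simple, hypothetical scores (match +1, mismatch −1, linear gap −2) rather than the GAP Version 10 parameters specified herein. It is therefore not an “equivalent program” in the sense defined herein and can give different identities for some sequence pairs. Percent identity is computed over all alignment columns, gaps included, which is one common reading of “total number of residues.”

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty; returns the two gapped strings."""
    a, b = a.upper(), b.upper()
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical, non-gap columns / all alignment columns x 100."""
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * identical / len(aln_a)


aln = needleman_wunsch("ACGT", "AGT")
print(aln, percent_identity(*aln))   # ('ACGT', 'A-GT') 75.0
```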

Various computer implementations based on these mathematical algorithms have been designed to enable the determination of sequence identity. The BLAST programs of Altschul et al. (1990) J. Mol. Biol. 215:403 are based on the algorithm of Karlin and Altschul (1990) supra. Searches to obtain nucleotide sequences that are homologous to nucleotide sequences of the present invention can be performed with the BLASTN program, score=100, wordlength=12. Gapped alignments may be obtained by using Gapped BLAST (in BLAST 2.0) as described in Altschul et al. (1997) Nucleic Acids Res. 25:3389. To detect distant relationships between molecules, PSI-BLAST can be used. See Altschul et al. (1997) supra. For all of the BLAST programs, the default parameters of the respective programs can be used. See www.ncbi.nlm.nih.gov. Alignment may also be performed manually by inspection.

Another program that can be used to determine percent sequence identity is the ALIGN program (version 2.0), which uses the mathematical algorithm of Myers and Miller (1988) supra. In addition to the ALIGN and BLAST programs, the BESTFIT, GAP, FASTA and TFASTA programs are part of the GCG Wisconsin Genetics Software Package, Version 10 (available from Accelrys Inc., 9685 Scranton Rd., San Diego, Calif., USA), and can be used for performing sequence alignments. The preferred program is GAP version 10, which uses the algorithm of Needleman and Wunsch (1970) supra. Unless otherwise stated, the sequence identity values provided herein refer to those values obtained by using GAP Version 10 with the following parameters: % identity using GAP Weight of 50 and Length Weight of 3 and the nwsgapdna.cmp scoring matrix. Other equivalent programs may also be used. By “equivalent program” is intended any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide residue matches and an identical percent sequence identity when compared to the corresponding alignment generated by GAP Version 10.

The following examples are offered by way of illustration and not by way of limitation.

EXPERIMENTAL

Example 1 DNA Analysis

The genomic DNA sequence from Lactobacillus acidophilus NCFM® was analyzed for repetitive DNA by a “repeat and match analysis” using Applied Maths' Kodon software package. One intergenic region between DNA polymerase I (polA) (ORF 1550) and a putative phosphoribosylamine-glycine ligase (purD) (ORF 1551) was identified as having features characteristic of a CRISPR locus. This region is approximately 2.4 kb long and contains 32 nearly perfect repeats of 29 base pairs separated by 32 base pair spacers (see FIGS. 1 and 2).

A number of features of the CRISPR region can be seen in FIG. 2. The 29 base pair repeats are highlighted. The first nucleotide of the repeat is either an A or a G. The last nucleotide of the repeat changes from a T to a C at repeat number 21. An imperfect inverted repeat is indicated by an underline on the last repeat. The first repeat contains two A→T base substitutions. The 26th repeat contains one C→T base substitution. Two sequences are repeated in the spacer region; one is repeated twice (bolded and outlined) and one is repeated three times (bolded and caps). The 16th spacer region is one base longer than the others.
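
Given a repeat consensus, one simple way to recover the repeat-spacer structure described above is to scan for near-exact copies of the consensus and treat the stretches between consecutive copies as spacers. The sketch below is a hypothetical stand-in for illustration and is not the Kodon repeat-and-match analysis used in this example. It allows substitutions only (no insertions or deletions within a repeat) and uses a greedy, non-overlapping scan. The toy sequence is invented. Spacer lengths are measured rather than assumed, so a spacer one base longer than the others (like the 16th spacer) is handled naturally.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_repeats(seq: str, consensus: str, max_mismatches: int = 2) -> list[int]:
    """Start positions of non-overlapping near-copies of `consensus` (substitutions only)."""
    seq, consensus = seq.upper(), consensus.upper()
    k = len(consensus)
    hits, i = [], 0
    while i <= len(seq) - k:
        if hamming(seq[i:i + k], consensus) <= max_mismatches:
            hits.append(i)
            i += k            # jump past this repeat copy
        else:
            i += 1
    return hits


def spacers(seq: str, hits: list[int], repeat_len: int) -> list[str]:
    """Sequence between the end of each repeat copy and the start of the next."""
    return [seq[start + repeat_len:nxt] for start, nxt in zip(hits, hits[1:])]


# Toy array: repeat, spacer, repeat with one substitution, spacer, repeat.
toy = "AAAACCCC" + "GTGT" + "AAATCCCC" + "TGTG" + "AAAACCCC"
hits = find_repeats(toy, "AAAACCCC", max_mismatches=1)
print(hits, spacers(toy, hits, 8))   # [0, 12, 24] ['GTGT', 'TGTG']
```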

Example 2 PCR of Intergenic Region

Primers were designed to amplify the entire intergenic region between polA and purD (expected product size = 2582 base pairs). The primers were as follows:

1550_F (SEQ ID NO: 49): 5′ GCA TTA GTG TGC AAC CCA TCT GG 3′
1551_R (SEQ ID NO: 50): 5′ GAT CTG CTG GAT TGC TTC TAC CG 3′
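
With the primer sequences in hand, the expected product size for a candidate template can be checked computationally. The following minimal in silico PCR sketch assumes exact primer matches, although real primers tolerate some mismatches. It searches both orientations of the template and measures the product from the 5′ end of the forward primer to the 5′ end of the reverse primer. The template in the usage example is synthetic. For a real genome, the result is only meaningful if both primer sites occur there exactly.

```python
FWD = "GCATTAGTGTGCAACCCATCTGG"   # 1550_F (SEQ ID NO: 49)
REV = "GATCTGCTGGATTGCTTCTACCG"   # 1551_R (SEQ ID NO: 50)

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMP)[::-1]


def amplicon_length(template: str, fwd: str = FWD, rev: str = REV):
    """Product length in bp, or None if either primer site is missing (exact matches only)."""
    rev_site = revcomp(rev)        # reverse primer's binding site as read on the top strand
    for strand in (template.upper(), revcomp(template)):
        start = strand.find(fwd)
        if start == -1:
            continue
        end = strand.find(rev_site, start + len(fwd))
        if end != -1:
            return end + len(rev_site) - start
    return None


# Synthetic template: forward site, 100 nt of filler, reverse-primer site.
template = "TT" + FWD + "A" * 100 + revcomp(REV) + "GG"
print(amplicon_length(template))            # 146
print(amplicon_length(revcomp(template)))   # 146
```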

A PCR reaction mix was set up for each reaction (25.0 μl of AccuPrime SuperMix II (2× conc.); 1.0 μl of each primer (20 μM); 1 μl of template (300 ng/μl); H₂O to 50.0 μl). The reaction conditions were as follows: 1 cycle at 95° C. for 5 minutes; 40 cycles with a first step at 95° C. for 30 seconds, a second step at 54° C. for 30 seconds, and a third step at 68° C. for 3 minutes; 1 cycle at 68° C. for 7 minutes.
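
The short calculation below makes explicit the C1V1 = C2V2 arithmetic behind the reaction set-up above: final concentrations, water volume, and programme length. It is a convenience sketch. It assumes the listed volumes are exact and ignores thermocycler ramp times when estimating run length.

```python
TOTAL_UL = 50.0

# (stock concentration, volume added in ul) for each component listed above
components = {
    "AccuPrime SuperMix II (x)": (2.0, 25.0),
    "forward primer (uM)": (20.0, 1.0),
    "reverse primer (uM)": (20.0, 1.0),
    "template (ng/ul)": (300.0, 1.0),
}


def final_concentration(stock: float, added_ul: float, total_ul: float = TOTAL_UL) -> float:
    """C1 * V1 / V2: concentration after dilution into the full reaction volume."""
    return stock * added_ul / total_ul


water_ul = TOTAL_UL - sum(vol for _, vol in components.values())
for name, (stock, vol) in components.items():
    print(f"{name}: {final_concentration(stock, vol):g} final")
print(f"water: {water_ul:g} ul")                    # water: 22 ul

# Programme length, ignoring ramp times: 5 min + 40 x (30 + 30 + 180 s) + 7 min.
hold_min = 5 + 40 * (30 + 30 + 180) / 60 + 7
print(f"total hold time: {hold_min:g} min")          # total hold time: 172 min
```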

This PCR was performed on sixteen L. acidophilus strains. All L. acidophilus strains that had previously been shown to be identical to L. acidophilus NCFM® by other means (e.g., PFGE, microarrays, 16S sequencing) generated the same size PCR amplicon. Three strains that had previously been shown to be different from NCFM® (ATCC 4356, ATCC 4357, and Strain B) exhibited different sized amplicons. Strains of Lactobacillus helveticus, Lactobacillus gasseri, and Lactobacillus plantarum that were tested did not generate a PCR product.

Four strains were found that did not generate a PCR product: L. acidophilus ATCC 521, L. acidophilus strain F, L. acidophilus strain G, and L. acidophilus strain H. These strains were sent to MIDI Labs for identification and were identified as follows:

L. acidophilus ATCC 521: L. helveticus
L. acidophilus strain F: Pediococcus parvulus
L. acidophilus strain G: L. gasseri
L. acidophilus strain H: L. plantarum

The PCR results for 6 strains are shown in FIG. 3A. The different sized bands indicated that there were significant differences in the CRISPR region of some strains.

Example 3 PCR Amplification Method is Specific for Lactobacillus acidophilus Detection

PCR was performed on 23 bacterial samples as described in Example 2. PCR amplification of all L. acidophilus strains tested resulted in a PCR amplicon, whereas all other species tested did not (see FIG. 4). The species of all tested strains were confirmed using 16S sequencing. Therefore, among the strains tested, this method is specific for L. acidophilus.

Example 4 Restriction Digestion of Intergenic Region

In order to generate more discriminatory patterns for each strain, the CRISPR PCR products were subjected to restriction digestion with three enzymes that generated between 10 and 24 bands: AluI, 10 bands; MseI, 19 bands; Tsp509I, 24 bands.

AluI: Six CRISPR PCR products were digested with AluI and separated on a 2% agarose gel (FIG. 3B). Three strains exhibited a difference in banding pattern: ATCC 4356, ATCC 4357, and strain B. These results are in agreement with the results of other tests (microarray, transposase-PCR, PFGE) that indicate these three strains are unique (data not shown).

MseI: Six CRISPR PCR products were digested with MseI and separated on a 3% agarose gel (FIG. 3C).

Tsp509I: Six CRISPR PCR products were digested with Tsp509I and separated on a 3% agarose gel (FIG. 3D).

Example 5 PCR Amplification Followed by Enzymatic Digestion can Differentiate L. acidophilus Strains

Fourteen L. acidophilus strains were subjected to both CRISPR locus amplification and restriction enzyme digestion as described in Examples 2 and 4. Seven distinct band patterns were generated, indicating that this method can differentiate between strains (see FIG. 5).
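
The decision of whether two lanes share a band pattern can be written out explicitly. The sketch below is a hypothetical simplification and not the scoring used in these examples. Each lane is a list of estimated fragment sizes. Two lanes count as identical if they have the same number of bands and each pair of sizes agrees within a fractional tolerance (5% here, an assumption). Strains are then grouped greedily against the first member of each group. The strain names and sizes are invented.

```python
def patterns_match(p: list[float], q: list[float], tol: float = 0.05) -> bool:
    """Same band count, and each paired size within `tol` (fraction of the larger size)."""
    if len(p) != len(q):
        return False
    return all(abs(a - b) <= tol * max(a, b) for a, b in zip(sorted(p), sorted(q)))


def group_strains(lanes: dict, tol: float = 0.05) -> list[list[str]]:
    """Greedy grouping: each lane joins the first group whose first member it matches."""
    groups: list[list[str]] = []
    for name, pattern in lanes.items():
        for group in groups:
            if patterns_match(lanes[group[0]], pattern, tol):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Invented fragment sizes (bp) for four lanes.
lanes = {
    "S1": [500, 300, 120],
    "S2": [505, 298, 121],
    "S3": [700, 120],
    "S4": [500, 300, 120],
}
print(group_strains(lanes))   # [['S1', 'S2', 'S4'], ['S3']]
```

The number of groups returned corresponds to the number of distinct band patterns, which is the quantity reported in this example.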

Example 6 PCR/Digestion Products Match PFGE Results

PFGE was performed on the fourteen L. acidophilus strains discussed in Example 5. The PFGE results confirmed those obtained by using the PCR/Digestion Method as described in Examples 2-5 (see FIG. 6). NCFM® and Lac-1 strains showed identical PFGE and PCR/Digestion results, but differed from Lac-3 and ATCC 4356.

Example 7 Identification of CRISPR Regions in Other Lactobacillus Species

Other Lactobacillus species were analyzed for CRISPR sequences as described in Example 1. CRISPR sequences were found in L. brevis, L. casei and L. delbrueckii ssp. bulgaricus. The repeat sequences are shown in FIG. 7, with variant nucleotides shown below the main sequences. Within the regions analyzed, 32 repeats were present in L. acidophilus, 12 repeats were present in L. brevis, 21 repeats were present in L. casei, and 17 repeats were present in L. delbrueckii ssp. bulgaricus.

Example 8 Strain Typing of Lactobacillus Species

Primers are designed to amplify the entire CRISPR region of L. delbrueckii ssp. bulgaricus. A PCR reaction mixture is set up and PCR is performed on ten L. delbrueckii ssp. bulgaricus strains, as described in Example 2. The PCR products are subjected to restriction digestion with AluI, MseI, and Tsp509I as described in Example 4. The DNA is separated by gel electrophoresis and the band patterns are analyzed. Detection of different band patterns indicates the presence of different strains of L. delbrueckii ssp. bulgaricus.

Conclusions: The identification of a unique CRISPR region in NCFM® is a promising discovery for the development of detection and differentiation methods. Of 20 strains designated as L. acidophilus tested, 16 generated a CRISPR-PCR fragment with the designed primers. The four strains for which no fragment was amplified were confirmed by MIDI Labs as being misidentified, strengthening the position of this CRISPR locus as being L. acidophilus specific. The remaining 16 strains were subjected to restriction analysis of the CRISPR-PCR fragment, revealing 12 strains with identical restriction patterns and 3 strains with unique patterns. These results are supported by data that have been generated independently by comparative genome microarray analysis, transposase-PCR analysis, and PFGE.

In summary, a relatively quick and easy CRISPR-PCR/restriction analysis generated unique fragmentation patterns for the truly different L. acidophilus strains tested. The method can also be applied in other Lactobacillus species, including L. brevis, L. casei, and L. delbrueckii.

All publications and patent applications mentioned in the specification are indicative of the level of skill of those skilled in the art to which this invention pertains. All publications and patent applications are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.

Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be obvious that certain changes and modifications may be practiced within the scope of the appended claims.

1.-20. (canceled)
21. An isolated nucleic acid molecule selected from the group consisting of: a) a nucleic acid molecule comprising the nucleotide sequence of SEQ ID NO: 1, or a complement thereof; b) a nucleic acid molecule comprising a nucleotide sequence having at least 75% sequence identity to the nucleotide sequence of SEQ ID NO: 1, or a complement thereof; c) a nucleic acid molecule comprising 1-140 repeats of at least one of the nucleotide sequences set forth in SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, or a variant thereof.